SMHP: slurm exporter to report gpu metrics #181

Open: wants to merge 1 commit into main

@verdimrc (Contributor) commented Mar 6, 2024

Issue #, if available: N/A

Description of changes: Prometheus Slurm exporter to report GPU metrics (total, allocated).

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
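To check the new GPU metrics by hand, one option is to scrape the exporter and filter the GPU series. The sketch below has several assumptions. The exporter is taken to listen on port 8080, and the metric names `slurm_gpus_total` and `slurm_gpus_alloc` are assumed unlabelled gauges from the upstream exporter, not names confirmed in this PR. The `gpu_summary` helper is hypothetical. It reads Prometheus text format from stdin, so it can be tried without a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize GPU metrics from Prometheus text format on stdin.
# Assumes unlabelled gauges named slurm_gpus_total and slurm_gpus_alloc.
gpu_summary() {
  awk '
    # Skip HELP/TYPE comment lines.
    /^#/ { next }
    $1 == "slurm_gpus_total" { total = $2 }
    $1 == "slurm_gpus_alloc" { alloc = $2 }
    END {
      if (total == "") { print "no GPU metrics found" > "/dev/stderr"; exit 1 }
      printf "total=%s allocated=%s\n", total, alloc
    }
  '
}

# Usage against a live exporter (port 8080 is an assumption):
#   curl -s http://localhost:8080/metrics | gpu_summary
```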

@sean-smith (Contributor)

Nice!

@@ -10,7 +10,7 @@ if sudo systemctl is-active --quiet slurmctld; then
 echo "Go is already installed."
 fi
 echo "This was identified as the controller node because Slurmctld is running. Begining SLURM Exporter Installation"
-git clone -b 0.20 https://github.com/vpenso/prometheus-slurm-exporter.git
+git clone -b development https://github.com/vpenso/prometheus-slurm-exporter.git
Contributor

Set a tag, not the development branch.
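`git clone -b` accepts a tag name as well as a branch name, so one way to follow this suggestion is to make the cloned ref a variable. This is a minimal sketch only. The `EXPORTER_REF` variable and the `clone_exporter` helper are hypothetical and not part of the install script. The default is the `development` branch used in the diff. Note that `-b` does not accept a raw commit SHA.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clone a repository at a configurable branch or tag.
# EXPORTER_REF is an assumed variable name; it defaults to the branch used in the diff.
clone_exporter() {
  local repo_url="$1"
  local dest="$2"
  local ref="${EXPORTER_REF:-development}"

  # Skip the clone if the destination is already a git checkout, so reruns are harmless.
  if [ -d "$dest/.git" ]; then
    echo "$dest already cloned; skipping." >&2
    return 0
  fi

  # -b accepts a branch or a tag name (a tag leaves HEAD detached at that commit).
  git clone --quiet -b "$ref" "$repo_url" "$dest"
}

# Usage (network access required):
#   EXPORTER_REF=0.20 clone_exporter https://github.com/vpenso/prometheus-slurm-exporter.git prometheus-slurm-exporter
```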

Contributor (author)

-gpus-acct throws an error with v0.20, contrary to the documentation, which lists it for >=0.19. A few issues link this to the Slurm version. The development branch works, and I can pin it to a specific commit.
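Pinning to a specific commit could look like this: clone, check out the exact SHA in detached-HEAD mode, then verify the result. The `clone_pinned` helper is hypothetical. No particular exporter commit is assumed; the SHA would be whichever development-branch commit was tested.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clone a repository and pin it to an exact commit.
# The commit SHA is supplied by the caller; none is assumed here.
clone_pinned() {
  local repo_url="$1"
  local dest="$2"
  local commit="$3"

  git clone --quiet "$repo_url" "$dest" || return 1
  # A detached checkout of an exact SHA stays put even if the branch gains new commits.
  git -C "$dest" checkout --quiet --detach "$commit" || return 1

  # Verify that HEAD resolves to the requested commit (abbreviated SHAs are accepted).
  local head want
  head="$(git -C "$dest" rev-parse HEAD)"
  want="$(git -C "$dest" rev-parse --verify "${commit}^{commit}")" || return 1
  if [ "$head" != "$want" ]; then
    echo "pinned checkout mismatch: $head != $want" >&2
    return 1
  fi
}

# Usage (network access required; <sha> is a placeholder):
#   clone_pinned https://github.com/vpenso/prometheus-slurm-exporter.git prometheus-slurm-exporter <sha>
```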

If the development branch (pinned or not) is not preferred, I need to test whether the main branch works. Otherwise, the options are to drop -gpus-acct or to pin to the head of the development branch (its latest commit was two years ago anyway).
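If the fallback of dropping -gpus-acct is chosen, the flag could be made switchable instead of being removed outright. This sketch rests on several assumptions. The `ENABLE_GPU_ACCT` variable, the `exporter_args` helper and the binary path in the usage comment are all hypothetical. `-listen-address` and `-gpus-acct` are taken to be the exporter's command-line flags, and `:8080` is taken to be its usual listen address. The flags use the `-flag=value` form that Go's `flag` package accepts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the exporter's flags on one line.
# $1 is the listen address (default :8080); ENABLE_GPU_ACCT=0 (assumed name) omits -gpus-acct.
exporter_args() {
  local listen="${1:-:8080}"
  local args="-listen-address=$listen"
  # Only request GPU accounting when enabled; this is the flag reported to fail on v0.20.
  if [ "${ENABLE_GPU_ACCT:-1}" = "1" ]; then
    args="$args -gpus-acct"
  fi
  printf '%s\n' "$args"
}

# Usage (binary path is an assumption; $(...) is left unquoted on purpose so the flags split):
#   ./bin/prometheus-slurm-exporter $(exporter_args)
#   ./bin/prometheus-slurm-exporter $(ENABLE_GPU_ACCT=0 exporter_args)
```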

Contributor

Can you raise an issue on https://github.com/vpenso/prometheus-slurm-exporter to get another release cut?

@KeitaW force-pushed the smhp-slurm-exporter-gpu branch 2 times, most recently from e48c186 to b4a4395 (June 4, 2024 02:26)
@KeitaW force-pushed the main branch 3 times, most recently from 44e448e to 1209815 (June 4, 2024 02:30)
3 participants